HDDS-15193. Move the "atomic key creation" logic from output stream to S3 endpoints #10202

Open

peterxcli wants to merge 8 commits into apache:master from peterxcli:codex/HDDS-15193-remove-old-atomic-key-creation

Conversation

@peterxcli
Member

@peterxcli peterxcli commented May 6, 2026

What changes were proposed in this pull request?

This is a functionally no-op change.

The "atomic key creation" logic in the Ozone client output stream exists only so that s3g can perform a length check.

This patch removes the "atomic key creation" logic from the Ozone client output stream classes and shifts responsibility for validating S3 object upload content length from the client layer to the S3 gateway layer. The main effect is that S3-specific length checks are now enforced in the S3 gateway rather than in the general Ozone client code, giving a cleaner separation of concerns and more maintainable code.

The name "atomic key creation" is also misleading: it suggests the feature is about atomicity, but it is actually an integrity check on the written block data for s3g. Users who want real atomicity must use conditional requests.

Related to: #5524
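As a rough illustration of the gateway-side check being moved here (a minimal sketch with illustrative names; `ContentLengthCheck` and `IncompleteBodyException` are stand-ins, not the exact Ozone API), the S3 gateway compares the bytes actually written against the declared Content-Length before committing the key:

```java
// Hypothetical sketch: the S3 gateway validates that the number of bytes
// actually written matches the declared Content-Length header, and refuses
// to commit the key otherwise. Names are illustrative, not the Ozone API.
public final class ContentLengthCheck {

  /** Thrown when the streamed body is shorter or longer than declared. */
  public static class IncompleteBodyException extends RuntimeException {
    public IncompleteBodyException(long expected, long actual) {
      super("expected " + expected + " bytes but wrote " + actual);
    }
  }

  /** Gateway-side check: only commit when the full declared body arrived. */
  public static void validateContentLength(long declared, long written) {
    if (declared != written) {
      throw new IncompleteBodyException(declared, written);
    }
  }

  public static void main(String[] args) {
    validateContentLength(1024, 1024); // full body: commit proceeds
    try {
      validateContentLength(1024, 512); // truncated upload: commit rejected
      throw new AssertionError("should have thrown");
    } catch (IncompleteBodyException expected) {
      // the incomplete key is never committed
    }
  }
}
```

Note this is an integrity check, not atomicity: it only guards against committing a key whose body does not match the declared length.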

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15193

How was this patch tested?

(Please explain how this patch was tested. Ex: unit tests, manual tests, workflow run on the fork git repo.)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this.)

@peterxcli peterxcli changed the title HDDS-15193. Remove old atomic key creation HDDS-15193. Remove old S3 atomic key creation May 6, 2026
@peterxcli peterxcli marked this pull request as ready for review May 6, 2026 18:28
@peterxcli peterxcli requested a review from ChenSammi May 8, 2026 06:27
@peterxcli
Member Author

cc @xichen01, @ivandika3 Could you please take a look? Thanks!

@chungen0126
Contributor

Thanks @peterxcli for the patch. However, I went back to review the PR for these lines, and it appears they are not related to conditional requests. They were actually added to fix a commit overwrite issue that occurs when multiple S3G instances are writing the same key.

I am concerned about removing them, as I don't think that specific problem has been resolved yet. Could you please provide a bit more detail on why these lines need to be deleted? Thanks!

Signed-off-by: peterxcli <peterxcli@gmail.com>
@peterxcli peterxcli changed the title HDDS-15193. Remove old S3 atomic key creation HDDS-15193. refactor old S3 atomic key creation May 8, 2026
@peterxcli peterxcli changed the title HDDS-15193. refactor old S3 atomic key creation HDDS-15193. Keep length validation logic only in s3g module May 8, 2026
@peterxcli peterxcli changed the title HDDS-15193. Keep length validation logic only in s3g module HDDS-15193. Movesthe "atomic key creation" logic from client output to S3 Endpoint May 8, 2026
@peterxcli peterxcli changed the title HDDS-15193. Movesthe "atomic key creation" logic from client output to S3 Endpoint HDDS-15193. Movesthe "atomic key creation" logic from output stream to S3 Endpoint May 8, 2026
@peterxcli peterxcli changed the title HDDS-15193. Movesthe "atomic key creation" logic from output stream to S3 Endpoint HDDS-15193. Move the "atomic key creation" logic from output stream to S3 endpoints May 8, 2026
Contributor

@ivandika3 ivandika3 left a comment


Thanks @peterxcli , overall LGTM. Using the new pre-commit hook is a good idea. Left a few nits.

peterxcli added 3 commits May 9, 2026 17:44
…date ClientProtocolStub accordingly.

Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
@peterxcli
Member Author

I cherry-picked the refactor/fix for the test failure to #10224.

The error log is:

2026-05-09 07:21:08,033 [qtp1907241392-109] WARN server.HttpChannel: /bucket-qiyxgiosny/ozone-test-5105985444/ecmultipartKey32
javax.servlet.ServletException: javax.servlet.ServletException: java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.ozone.client.io.KeyDataStreamOutput.setPreCommits(java.util.List)" because the return value of "org.apache.hadoop.ozone.client.io.OzoneDataStreamOutput.getKeyDataStreamOutput()" is null

The root cause of the EC MPU failure is that RpcClient#createMultipartStreamKey detects EC and falls back to an OzoneOutputStream backed by KeyOutputStream, so getKeyDataStreamOutput() returned null and triggered the NPE.
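The guard needed for that NPE can be sketched as follows (simplified types; `EcFallbackGuard` and its `register` helper are illustrative, not the real Ozone client classes): when the EC path falls back to a KeyOutputStream, getKeyDataStreamOutput() returns null and must be checked before calling setPreCommits.

```java
import java.util.List;

// Illustrative guard for the NPE described above: simplified stand-ins for
// the real Ozone client classes, showing the null check on the EC fallback.
public final class EcFallbackGuard {

  interface KeyDataStreamOutput {
    void setPreCommits(List<Runnable> hooks);
  }

  static final class OzoneDataStreamOutput {
    private final KeyDataStreamOutput inner; // null on the EC fallback path
    OzoneDataStreamOutput(KeyDataStreamOutput inner) { this.inner = inner; }
    KeyDataStreamOutput getKeyDataStreamOutput() { return inner; }
  }

  /** Registers hooks only when a data-stream output actually backs the key. */
  static boolean register(OzoneDataStreamOutput out, List<Runnable> hooks) {
    KeyDataStreamOutput k = out.getKeyDataStreamOutput();
    if (k == null) {
      return false; // EC fell back to KeyOutputStream; nothing to set
    }
    k.setPreCommits(hooks);
    return true;
  }

  public static void main(String[] args) {
    if (register(new OzoneDataStreamOutput(null), List.of())) {
      throw new AssertionError("EC fallback must not register hooks");
    }
    if (!register(new OzoneDataStreamOutput(hooks -> { }), List.of())) {
      throw new AssertionError("streaming path should register hooks");
    }
  }
}
```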

@chungen0126
Contributor

chungen0126 commented May 9, 2026

Thanks @peterxcli for working on this. However, I have some concerns regarding the feasibility of placing this atomic validation within the pre-commit hook. Here are my thoughts:

I believe the original issue stems from scenarios where a timeout occurs due to network instability or other unknown factors (on the client side or the datanode side), leading to a wrong commit.

Consider this flow: If an exception is thrown during the IO copy process because of timeout, it will jump to the catch block. At this point, the content length validation has not yet been added into the pre-commit hook, meaning the incorrect commit would still proceed.

The original atomic validation was implemented directly within the close() method, ensuring it executed no matter when an exception occurred. Now it has been moved to a pre-commit hook that is registered later. If an exception interrupts the process before this hook is registered, the validation will be skipped, and an incomplete key might still be committed.

You can try testing one of the specific scenarios provided by @xichen01 for this issue.

Another similar problem:
Ozone can generate an incomplete key.

Reproduce Step:
Ozone S3
Upload the key and interrupt (ctrl + c) the upload before it completes.

An incomplete key will be committed:

[root@VM-8-3-centos ~]$ aws configure set default.s3.multipart_threshold 1GB
[root@VM-8-3-centos ~]$ aws s3 --endpoint-url http://localhost:9878 cp ~/500M.img s3://bucket1/500M.img

^Ccancelled: ctrl-c received
[root@VM-8-3-centos /tmp]$ aws s3 ls s3://bucket1/ --endpoint-url http://localhost:9878
2023-10-24 15:48:13 121208832 500M.img
AWS S3
Upload the key and interrupt (ctrl + c) the upload before it completes.

The incomplete key will not be committed:

[root@VM-8-3-centos ~]$ aws configure set default.s3.multipart_threshold 1GB
[root@VM-8-3-centos ~]$ aws s3 cp ~/500M.img s3://bucket1/500M.img
^Ccancelled: ctrl-c received
[root@VM-8-3-centos ~]$ aws s3 ls s3://bucket1/500M.img
// nothing output
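The ordering concern described above can be sketched in simplified form (not the real Ozone classes; `Output`, `upload`, and the `copyFails` flag are illustrative): if the body copy throws before the length check is registered as a pre-commit hook, the commit path runs with no validation at all.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the hook-ordering hazard: registering the length
// validation AFTER the IO copy means a copy failure leaves preCommits empty,
// so close() commits the (partial) key unvalidated.
public final class PreCommitOrdering {

  static final class Output {
    final List<Runnable> preCommits = new ArrayList<>();

    void close() {                       // commit path
      preCommits.forEach(Runnable::run); // runs whatever hooks exist
    }
  }

  /** Simulates an endpoint that copies the body, THEN registers the check. */
  static boolean upload(Output out, boolean copyFails) {
    try {
      if (copyFails) {
        throw new RuntimeException("timeout during IO copy");
      }
      out.preCommits.add(() -> { /* validateContentLength(...) */ });
    } catch (RuntimeException e) {
      // fall through to close(), as the endpoint's cleanup path would
    }
    out.close();
    return out.preCommits.isEmpty(); // true => commit ran with no validation
  }

  public static void main(String[] args) {
    if (!upload(new Output(), true)) {
      throw new AssertionError("failed copy should skip the hook entirely");
    }
    if (upload(new Output(), false)) {
      throw new AssertionError("successful copy should register the hook");
    }
  }
}
```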

@chungen0126
Contributor

cc @xichen01 as the author of the original implementation. Also cc @ChenSammi @kerneltime, as you both reviewed the previous PR. I would love to hear your thoughts on this.

@ChenSammi
Contributor

ChenSammi commented May 12, 2026

@peterxcli , I think this move of the "atomic key creation" logic cannot replace #5524. The key problem is that writeToStreamOutput() can throw an exception before validateContentLength is added to preCommits. You can double-check the stack I shared in the comments under https://issues.apache.org/jira/browse/HDDS-9526.

@peterxcli
Member Author

peterxcli commented May 12, 2026

[image]

Thanks, you're right. I'll check whether the length pre-commit can be registered before writeToStreamOutput.
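That proposed fix can be sketched as follows (a minimal sketch with illustrative names; `RegisterBeforeWrite`, `Output`, and `upload` are stand-ins, not the exact Ozone API): register the length pre-commit before any bytes are copied, so the validation is in place no matter where the copy fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the proposed fix: the length pre-commit is registered
// BEFORE the body copy, so the commit path always validates, even when the
// copy stops partway through.
public final class RegisterBeforeWrite {

  static final class Output {
    final List<Runnable> preCommits = new ArrayList<>();
    long written;

    void commit() {
      preCommits.forEach(Runnable::run); // validation always runs here
    }
  }

  static void upload(Output out, long declaredLength, long bytesAvailable) {
    // Register the check first, before writing any bytes.
    out.preCommits.add(() -> {
      if (out.written != declaredLength) {
        throw new IllegalStateException("incomplete body: " + out.written);
      }
    });
    out.written = Math.min(bytesAvailable, declaredLength); // simulated copy
    out.commit();
  }

  public static void main(String[] args) {
    upload(new Output(), 100, 100); // complete upload commits cleanly
    try {
      upload(new Output(), 100, 40); // truncated upload is rejected at commit
      throw new AssertionError("should have thrown");
    } catch (IllegalStateException expected) {
      // the partial key never reaches a clean commit
    }
  }
}
```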
